******Sigman and Lindberg 2017*****
*****Democracy for All*****
**Coder-level data analysis for Tables 5-7**

**Version: July 2017**


clear
use "v-dem_coder1.dta"


**\Tables 5, 6 and 7: Variance Components Analysis for Content Validation\**

set more off, permanently
log using "Tables5and6.smcl", replace

sum v2clacjust v2clsocgrp v2clsnlpct v2dlencmps v2dlunivl v2peedueq v2pehealth v2pepwrses v2pepwrsoc v2pepwrgen

*Rescale all coder scores 0-1
gen v2clacjust_n = (v2clacjust-0)/(4-0)
gen v2clsocgrp_n = (v2clsocgrp-0)/(4-0)
gen v2clsnlpct_n = (v2clsnlpct-0)/(100-0)
gen v2dlencmps_n = (v2dlencmps-0)/(4-0)
gen v2dlunivl_n = (v2dlunivl-0)/(5-0)
gen v2peedueq_n = (v2peedueq-0)/(4-0)
gen v2pehealth_n = (v2pehealth-0)/(4-0)
gen v2pepwrses_n = (v2pepwrses-0)/(4-0)
gen v2pepwrsoc_n = (v2pepwrsoc-0)/(4-0)
gen v2pepwrgen_n = (v2pepwrgen-0)/(4-0)
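
*Optional sanity check (a sketch, not part of the original analysis): confirm that every rescaled score lies in [0,1].
*It assumes the raw maxima used above (4, 5 and 100) match the coder-level data; a flagged variable would point to an out-of-range raw score.
foreach v of varlist v2clacjust_n v2clsocgrp_n v2clsnlpct_n v2dlencmps_n v2dlunivl_n v2peedueq_n v2pehealth_n v2pepwrses_n v2pepwrsoc_n v2pepwrgen_n {
    quietly summarize `v'
    * r(N) > 0 guards against all-missing variables, whose r(min) and r(max) are missing
    if r(N) > 0 & (r(min) < 0 | r(max) > 1) {
        display as error "`v' falls outside [0,1]: min = " r(min) ", max = " r(max)
    }
}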


save "v-dem_coder2.dta", replace

***Table 5***
*The Coder Effects Estimate and SE (first two columns in Table 5) appear below the long list of country and year coefficients, in the box labeled "Random-effects Parameters". They are in the row labeled "coder_id: Identity - var(_cons)".
*The number of observations and the number of groups (last two columns in Table 5) appear on the right-hand side of the Stata output, labeled "Number of obs" and "Number of groups". The number of groups is the number of unique coder IDs.

**EqProtec
mixed v2clacjust_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0581978, SE = .002767
*Coders = 1,070; Observations = 70,698
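
*A sketch of reading the same quantities from stored results instead of the printed output (not part of the original analysis).
*It assumes mixed's default parameterization, in which the coder-level random intercept is stored as a log standard deviation in an equation named lns1_1_1; confirm the name with "matrix list e(b)" before relying on it.
*capture noisily keeps the do-file running if that assumed equation name does not exist.
capture noisily _diparm lns1_1_1, f(exp(@)^2) d(2*exp(@)^2)
*Number of observations and number of groups (coders) from the last mixed fit
display "Observations = " %12.0fc e(N)
matrix list e(N_g)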

mixed v2clsocgrp_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0739381, SE = .0033405
*Coders = 1,178; Observations = 85,923

mixed v2clsnlpct_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0647591, SE = .0035029
*Coders = 828; Observations = 51,948

**EqDist
mixed v2dlencmps_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0506225, SE = .0023645
*Coders = 1,138; Observations = 82,823
 
mixed v2dlunivl_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0648738, SE = .0030134
*Coders = 1,137; Observations = 82,847

mixed v2peedueq_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0471302, SE = .0022193
*Coders = 1,145; Observations = 83,059

mixed v2pehealth_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0437318, SE = .0020848
*Coders = 1,133; Observations = 82,439


**EqAcc
mixed v2pepwrses_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0424155, SE = .0019714
*Coders = 1,150; Observations = 83,063

mixed v2pepwrsoc_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0493238, SE = .0022775
*Coders = 1,139; Observations = 82,869

mixed v2pepwrgen_n i.country_id i.year || coder_id:
*Coder effects (var(_cons)) = .0366731, SE = .0017333
*Coders = 1,147; Observations = 83,425




***Table 6***
*The Coder Effects Estimate and SE (first two columns in Table 6) appear below the long list of country and year coefficients, in the table labeled "Random-effects Parameters". They are in the row "coder_id: Identity - var(_cons)".
*The Indicator-level effects estimate and SE (third and fourth columns in Table 6) appear below the coder-level effects. They are in the row marked var(Residual).
*The number of coders (fifth column in Table 6) appears at the beginning of the output in the "Group Variable" table, in the row marked "coder_id". The number of observations (sixth column) appears above and to the right of the "Group Variable" table, labeled "Number of obs".

* Pooled model Equal Protection
keep country_id historical_date year coder_id v2clacjust_n v2clsocgrp_n v2clsnlpct_n
preserve 
  reshape long v2,i(country_id coder_id historical_date year) j(ind) string
  encode ind,gen(indnumb)
  drop if v2==.
  mixed v2 i.country_id i.year || _all: R.indnumb || coder_id:  
restore

*Observations = 208,569
*Coders (Group Variable - coder_id) = 1,192
*Coder effects (var(_cons)) = .0426726, SE = .0019449
*Indicator effects (var(Residual)) = .0670433, SE = .0002083

* Pooled model Equal Distribution
clear 
use "v-dem_coder2.dta"

keep country_id historical_date year coder_id v2dlencmps_n v2dlunivl_n v2peedueq_n v2pehealth_n 
preserve 
  reshape long v2,i(country_id coder_id historical_date year) j(ind) string
  encode ind,gen(indnumb)
  drop if v2==.
  mixed v2 i.country_id i.year || _all: R.indnumb || coder_id:  
restore

*Observations = 331,168
*Coders (Group Variable - coder_id) = 1,654
*Coder effects (var(_cons)) = .0377732, SE = .0014342
*Indicator effects (var(Residual)) = .058746, SE = .0001448


* Pooled model Equal Access
clear 
use "v-dem_coder2.dta"
keep country_id historical_date year coder_id v2pepwrses_n v2pepwrsoc_n v2pepwrgen_n
preserve 
  reshape long v2,i(country_id coder_id historical_date year) j(ind) string
  encode ind,gen(indnumb)
  drop if v2==.
  mixed v2 i.country_id i.year || _all: R.indnumb || coder_id:  
restore

*Observations = 249,357
*Coders (Group Variable - coder_id) = 1,160
*Coder effects (var(_cons)) = .0232301, SE = .0001083
*Indicator effects (var(Residual)) = .0508512, SE = .0001444

*Pooled model Egal Component
clear
use "v-dem_coder2.dta"
keep country_id historical_date year coder_id v2clacjust_n v2clsocgrp_n v2clsnlpct_n v2dlencmps_n v2dlunivl_n v2peedueq_n v2pehealth_n v2pepwrses_n v2pepwrsoc_n v2pepwrgen_n
preserve 
  reshape long v2,i(country_id coder_id historical_date year) j(ind) string
  encode ind,gen(indnumb)
  drop if v2==.
  mixed v2 i.country_id i.year || _all: R.indnumb || coder_id:  
restore

*Observations = 789,094
*Coders (Group Variable - coder_id) = 1,936
*Coder effects (var(_cons)) = .0036424, SE = .0010163
*Indicator effects (var(Residual)) = .0687577, SE = .0001096

log close
translate "Tables5and6.smcl" "Tables5and6.log", replace

***Table 7***

***Predicting with coder characteristics***
******VARIANCE COMPONENTS***(The bottom part of Table 7)
**For each model, indicator-level and country-level effects and standard errors are displayed in the output box labeled "Random-effects Parameters"**
**indnumb: Identity = Indicator-level effects
**country_id: Identity = Country-level effects
**The numbers of coders are repeated from Tables 5 and 6.


log using "Table7.smcl", replace
clear
use "v-dem_coder2.dta"

*Model 1: EqProtec 
clear
use "v-dem_coder2.dta"
gen phd=v2zzedlev>8 if v2zzedlev<.
gen gov=employ==2 if employ<.
keep country_id coder_id historical_date year v2clacjust_n v2clsocgrp_n v2clsnlpct_n phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred
preserve
 reshape long v2,i(country_id coder_id historical_date phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred) j(ind) string
 encode ind,gen(indnumb)
 drop if v2==.
 mixed v2  phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred || indnumb: || country_id:, robust  
restore

*Model 2: EqDist
clear
use "v-dem_coder2.dta"
gen phd=v2zzedlev>8 if v2zzedlev<.
gen gov=employ==2 if employ<.
keep country_id coder_id historical_date year v2dlencmps_n v2dlunivl_n v2peedueq_n v2pehealth_n  phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred
preserve
 reshape long v2,i(country_id coder_id historical_date phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred) j(ind) string
 encode ind,gen(indnumb)
 drop if v2==.
 mixed v2  phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred || indnumb: || country_id:, robust 
restore

*Model 3: EqAcc
clear
use "v-dem_coder2.dta"
gen phd=v2zzedlev>8 if v2zzedlev<.
gen gov=employ==2 if employ<.
keep country_id coder_id historical_date year v2pepwrses_n v2pepwrsoc_n v2pepwrgen_n phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred
preserve
 reshape long v2,i(country_id coder_id historical_date phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred) j(ind) string
 encode ind,gen(indnumb)
 drop if v2==.
 mixed v2  phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred || indnumb: || country_id:, robust 
restore

*Model 4: Egal
 
clear
use "v-dem_coder2.dta"
gen phd=v2zzedlev>8 if v2zzedlev<.
gen gov=employ==2 if employ<.
keep country_id coder_id historical_date year phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred v2clacjust_n v2clsocgrp_n v2clsnlpct_n v2dlencmps_n v2dlunivl_n v2peedueq_n v2pehealth_n v2pepwrses_n v2pepwrsoc_n v2pepwrgen_n 
preserve
 reshape long v2,i(country_id coder_id historical_date phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred) j(ind) string
 encode ind,gen(indnumb)
 drop if v2==.
 mixed v2  phd gov v2zzgender v2zzfremrk v2zzreside v2zztimein v2zzcurred || indnumb: || country_id:, robust  /* THIS TOOK THREE HOURS TO RUN! */
restore

log close
translate "Table7.smcl" "Table7.log", replace

